Performance Analysis of Output Threshold-Based Incremental Multiple-Relay Combining Scheme with Adaptive Modulation for Cooperative Networks
In this paper, we propose an output threshold-based incremental multiple-relay combining scheme for cooperative amplify-and-forward relay networks with nonidentically distributed relay channels. Specifically, to achieve the required performance, we consider both conventional incremental relaying and multiple-relay selection, where relays are adaptively selected based on a predetermined output threshold. Moreover, our proposed scheme adopts adaptive modulation to satisfy both the spectral-efficiency and error-rate requirements. For the proposed scheme, we first derive an upper bound on the output combined signal-to-noise ratio and then provide its statistics, namely the cumulative distribution function (CDF), probability density function (PDF), and moment generating function (MGF), over independent, nonidentically distributed Rayleigh fading channels. Additionally, we analyze the system performance in terms of average spectral efficiency, average bit error rate, outage probability, and system complexity. Finally, numerical examples show that our proposed scheme yields a clear performance improvement in cooperative networks.
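The selection rule described above can be sketched as follows. This is a minimal illustration based on our reading of the abstract, not the paper's exact formulation: the threshold `gamma_th`, the fixed polling order, and the MRC-sum SNR model are all assumptions.

```python
def threshold_incremental_combining(direct_snr, relay_snrs, gamma_th):
    """Sketch: start from the direct-link SNR and incrementally add relays,
    MRC-style, until the combined output SNR meets the threshold gamma_th.
    (Hypothetical simplification; the paper's scheme may differ in detail.)"""
    combined = direct_snr
    used = 0
    for snr in relay_snrs:        # relays polled in a fixed order (assumption)
        if combined >= gamma_th:
            break                 # threshold met: stop activating relays
        combined += snr           # MRC: output SNR is the sum of branch SNRs
        used += 1
    return combined, used
```

The incremental stopping rule is what trades performance against complexity: relays past the threshold are never activated, which is the system-complexity saving the abstract analyzes.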
Efficient Video Representation Learning via Masked Video Modeling with Motion-centric Token Selection
Self-supervised Video Representation Learning (VRL) aims to learn
transferrable representations from uncurated, unlabeled video streams that
could be utilized for diverse downstream tasks. With recent advances in Masked
Image Modeling (MIM), in which the model learns to predict randomly masked
regions in the images given only the visible patches, MIM-based VRL methods
have emerged and demonstrated their potential by significantly outperforming
previous VRL methods. However, they require an excessive amount of computations
due to the added temporal dimension. This is because existing MIM-based VRL
methods overlook spatial and temporal inequality of information density among
the patches in arriving videos by resorting to random masking strategies,
thereby wasting computations on predicting uninformative tokens/frames. To
tackle these limitations of Masked Video Modeling, we propose a new token
selection method that masks out the more important tokens according to the
object's motions in an online manner, which we refer to as Motion-centric Token
Selection. Further, we present a dynamic frame selection strategy that allows
the model to focus on informative and causal frames with minimal redundancy. We
validate our method on multiple benchmarks, including Ego4D, showing that
the model pre-trained with our proposed method significantly outperforms
state-of-the-art VRL methods on downstream tasks such as action recognition
and object state change classification, while largely reducing memory
requirements during pre-training and fine-tuning.
Comment: 15 pages
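The motion-centric masking idea can be sketched with a frame-difference proxy. This is a hypothetical NumPy illustration, not the paper's method: the patch size, the use of raw frame differences as the motion signal, and the mask ratio are all assumptions.

```python
import numpy as np

def motion_centric_mask(frames, mask_ratio=0.9, patch=16):
    """Sketch: rank patch tokens by frame-difference motion magnitude and
    mask the most motion-heavy ones, so the model spends its prediction
    budget on informative tokens (assumed proxy for token importance)."""
    # frames: (T, H, W) grayscale video; H and W divisible by `patch`
    T, H, W = frames.shape
    # per-pixel motion map: summed absolute temporal differences
    diff = np.abs(np.diff(frames, axis=0)).sum(axis=0)          # (H, W)
    # aggregate motion per non-overlapping patch token
    scores = diff.reshape(H // patch, patch, W // patch, patch)
    scores = scores.sum(axis=(1, 3)).ravel()                    # one score per token
    n_mask = int(mask_ratio * scores.size)
    masked = np.argsort(scores)[::-1][:n_mask]  # highest-motion tokens first
    mask = np.zeros(scores.size, dtype=bool)
    mask[masked] = True
    return mask
```

Because the scores come only from differences of already-seen frames, the ranking can be computed in an online, streaming fashion, which matches the abstract's "online manner" claim.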
Exploring Chemical Space with Score-based Out-of-distribution Generation
A well-known limitation of existing molecular generative models is that the
generated molecules highly resemble those in the training set. To generate
truly novel molecules that may have even better properties for de novo drug
discovery, more powerful exploration in the chemical space is necessary. To
this end, we propose Molecular Out-Of-distribution Diffusion (MOOD), a
score-based diffusion scheme that incorporates out-of-distribution (OOD)
control into the generative stochastic differential equation (SDE) through a
single hyperparameter, and thus incurs no additional cost. Since some
novel molecules may not meet the basic requirements of real-world drugs, MOOD
performs conditional generation by utilizing the gradients from a property
predictor that guides the reverse-time diffusion process to high-scoring
regions according to target properties such as protein-ligand interactions,
drug-likeness, and synthesizability. This allows MOOD to search for novel and
meaningful molecules rather than generating unseen yet trivial ones. We
experimentally validate that MOOD is able to explore the chemical space beyond
the training distribution, generating molecules that outscore ones found with
existing methods, and even the top 0.01% of the original training pool. Our
code is available at https://github.com/SeulLee05/MOOD.
Comment: ICML 202
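The property-guided reverse diffusion described above can be illustrated with a toy 1-D sampler. Everything here is an assumed simplification for illustration, not MOOD's implementation: a constant diffusion coefficient, an analytic standard-normal score in place of a learned score network, and a quadratic "property" peaked at x = 2 in place of a trained property predictor.

```python
import numpy as np

def guided_reverse_sample(score_fn, prop_grad_fn, lam, n_steps=500, seed=0):
    """Toy 1-D Euler-Maruyama sampler for a reverse-time SDE whose drift
    adds lam * (property gradient) to the data score -- a hypothetical
    sketch of gradient-guided reverse diffusion."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    g2 = 1.0                      # constant diffusion coefficient (assumption)
    x = rng.normal()              # start from the prior
    for i in range(n_steps, 0, -1):
        t = i * dt
        guided_score = score_fn(x, t) + lam * prop_grad_fn(x)
        # reverse-time update: follow the guided score, re-injecting noise
        x = x + g2 * guided_score * dt + np.sqrt(g2 * dt) * rng.normal()
    return x

score = lambda x, t: -x                  # score of a standard normal (toy data)
prop_grad = lambda x: -2.0 * (x - 2.0)   # gradient of property -(x - 2)^2

plain  = np.mean([guided_reverse_sample(score, prop_grad, lam=0.0, seed=s)
                  for s in range(200)])
guided = np.mean([guided_reverse_sample(score, prop_grad, lam=1.0, seed=s)
                  for s in range(200)])
# guidance shifts the sample mean from near 0 toward the high-property region
```

The hyperparameter `lam` plays the role of the single OOD/guidance control knob: at `lam = 0` the sampler recovers the unconditional data distribution, and increasing it biases generation toward high-scoring regions of the property.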
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
There has been significant progress in Text-To-Speech (TTS) synthesis
technology in recent years, thanks to the advancement in neural generative
modeling. However, existing methods for any-speaker adaptive TTS achieve
unsatisfactory performance due to their suboptimal accuracy in mimicking the
target speakers' styles. In this work, we present Grad-StyleSpeech, an
any-speaker adaptive TTS framework based on a diffusion model that can
generate highly natural speech with extremely high similarity to a target
speaker's voice, given a few seconds of reference speech. Grad-StyleSpeech
significantly outperforms recent speaker-adaptive TTS baselines on English
benchmarks. Audio samples are available at
https://nardien.github.io/grad-stylespeech-demo.
Comment: ICASSP 202